[5525939] Allow user to select target opset in MOQ #809
Conversation
📝 Walkthrough

This change introduces opset version control to ONNX quantization, allowing users to specify target ONNX opset versions via the CLI.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant Quantizer as quantize()
    participant Preprocessor as _preprocess_onnx()
    participant OpsetUtil as Opset Utils
    participant Converter as convert_to_f16()
    participant Model as ONNX Model
    User->>Quantizer: quantize(model, opset=X)
    Quantizer->>Preprocessor: _preprocess_onnx(model, opset=X)
    Preprocessor->>OpsetUtil: get_qdq_precisions(model)
    OpsetUtil-->>Preprocessor: precision_set
    Preprocessor->>OpsetUtil: get_min_opset_for_precisions(precision_set)
    OpsetUtil-->>Preprocessor: mode_min_opset
    alt opset provided
        Preprocessor->>Preprocessor: validate opset vs mode_min_opset
        alt opset < mode_min_opset
            Preprocessor->>Preprocessor: warn & upgrade to mode_min_opset
        end
    else opset not provided
        Preprocessor->>Preprocessor: select max(original_opset, mode_min_opset)
    end
    Preprocessor->>Model: convert to target_opset
    Preprocessor-->>Quantizer: preprocessed_model
    Quantizer->>Converter: convert_to_f16(model, opset=target_opset)
    Converter->>OpsetUtil: get_qdq_precisions(model)
    Converter->>OpsetUtil: get_min_opset_for_precisions(precision_set)
    Converter->>Converter: compute min_opset with validation
    Converter-->>Quantizer: converted_model
    Quantizer-->>User: quantized_model
```
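The opset-selection branch in the diagram can be sketched in plain Python. This is a hypothetical helper (`select_target_opset` is not the PR's actual function name), assuming the behavior implied by the unit tests: the effective opset is never below the model's original opset or the mode minimum.

```python
import warnings

def select_target_opset(original_opset, mode_min_opset, requested_opset=None):
    """Pick the effective target opset: never below the original model's
    opset or the minimum required by the quantization mode."""
    candidates = [original_opset, mode_min_opset]
    if requested_opset is not None:
        candidates.append(requested_opset)
    target = max(candidates)
    if requested_opset is not None and requested_opset < target:
        # Mirrors the "warn & upgrade" branch in the diagram.
        warnings.warn(
            f"Requested opset {requested_opset} is below the required "
            f"{target}; upgrading the model to opset {target}."
        )
    return target
```

For example, requesting opset 17 for an int4 model (minimum 21) would upgrade to 21, while requesting 23 would be honored as-is.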
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/quantize.py (1)
440-521: Pass the effective target opset downstream.
`_preprocess_onnx` may upgrade the model's opset via `onnx.version_converter.convert_version()`, but `quantize_int8` and `quantize_fp8` receive the raw user `opset` input. This creates a mismatch: the quantizers' `opset` parameter may diverge from the actual model opset after preprocessing. Pass `get_opset_version(onnx_model)` (already imported) instead.

🔧 Suggested fix
```diff
 ) = _preprocess_onnx(
     onnx_path,
     use_external_data_format,
     output_path,
@@
     simplify,
     quantize_mode,
     opset,
 )
+target_opset = get_opset_version(onnx_model)
@@
-    opset=opset,
+    opset=target_opset,
     **kwargs,
 )
```
🤖 Fix all issues with AI agents
In `@modelopt/onnx/autocast/convert.py`:
- Around line 222-235: The docstring for the conversion routine is out of sync
with the implementation: low_precision_type uses a base_min_opset of 19 for
"fp16" (see base_min_opset and low_precision_type in convert.py) but the
docstring still claims 13; update the docstring to state that the default
minimum opset for fp16 is 19 (and bf16 is 22) and keep the note that Q/DQ nodes
may require increasing the opset (e.g., FP8/INT4/NVFP4) so the documentation
matches the logic in get_qdq_precisions, get_min_opset_for_precisions and the
base_min_opset assignment.
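As a minimal sketch of the relationship the docstring should describe (the values 19 and 22 come from this review comment; the function name and structure are illustrative, not the module's actual API):

```python
# Illustrative only: the per-dtype opset floor described in the review comment.
FP16_MIN_OPSET = 19  # fp16 conversion requires at least opset 19
BF16_MIN_OPSET = 22  # bf16 conversion requires at least opset 22

def base_min_opset(low_precision_type: str) -> int:
    """Return the default minimum opset for the low-precision target.

    Q/DQ nodes (e.g. FP8/INT4/NVFP4) may require increasing this further.
    """
    return {"fp16": FP16_MIN_OPSET, "bf16": BF16_MIN_OPSET}[low_precision_type]
```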
In `@modelopt/onnx/quantization/__main__.py`:
- Around line 289-297: Update the help text for the "--opset" argument added via
argparser.add_argument to include bf16 minimum opset info: mention that
--high_precision_dtype bf16 may require opset 22 (in addition to the existing
note about 19 for fp16, 21 for int4, 23 for nvfp4), and make the same change to
the other duplicate "--opset" help string found elsewhere in this module; ensure
the message clearly states that opset may be automatically increased if required
by operations.
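A sketch of what the updated argument definition might look like; the exact wording and the surrounding parser are assumptions, only `argparse` itself is standard:

```python
import argparse

parser = argparse.ArgumentParser(description="ONNX quantization (sketch)")
# Hypothetical help text folding in the reviewer's bf16 note.
parser.add_argument(
    "--opset",
    type=int,
    default=None,
    help=(
        "Target ONNX opset for the output model. Minimum requirements: "
        "19 for fp16, 21 for int4, 23 for nvfp4; --high_precision_dtype "
        "bf16 may require opset 22. The opset is automatically increased "
        "if required by the selected operations."
    ),
)
```

With this definition, `parser.parse_args(["--opset", "21"])` yields `opset=21`, and omitting the flag leaves it `None` so the preprocessing step can pick a default.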
In `@modelopt/onnx/quantization/quantize.py`:
- Around line 124-130: The min-opset lookup uses exact keys so variants like
"int4_awq" fall back to BASE_MIN_OPSET; normalize quantize_mode before querying
QDQ_PRECISION_MIN_OPSET: compute a normalized_mode (e.g., if "int4" in
quantize_mode -> "int4", if "nvfp4" in quantize_mode or "float4" variant ->
"float4_e2m1fn", etc.), then use QDQ_PRECISION_MIN_OPSET.get(normalized_mode,
BASE_MIN_OPSET) when setting mode_min_opset; update references in quantize.py
that use quantize_mode (including the substring checks and the get_opset_version
flow) to use the normalized value so variants resolve to the correct minimum
opset.
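A sketch of the normalization this comment asks for; the minimum-opset values (19/21/23) are taken from the numbers quoted elsewhere in this PR, and the helper names are hypothetical:

```python
BASE_MIN_OPSET = 19  # assumed default floor
QDQ_PRECISION_MIN_OPSET = {  # assumed canonical keys and values from this PR
    "int8": 19,
    "fp8": 19,
    "int4": 21,
    "float4_e2m1fn": 23,
}

def normalize_quantize_mode(quantize_mode: str) -> str:
    """Collapse mode variants (e.g. 'int4_awq') onto canonical lookup keys."""
    if "int4" in quantize_mode:
        return "int4"
    if "nvfp4" in quantize_mode or "float4" in quantize_mode:
        return "float4_e2m1fn"
    if "fp8" in quantize_mode:
        return "fp8"
    return quantize_mode

def min_opset_for_mode(quantize_mode: str) -> int:
    """Resolve the minimum opset, falling back to the base for unknown modes."""
    normalized = normalize_quantize_mode(quantize_mode)
    return QDQ_PRECISION_MIN_OPSET.get(normalized, BASE_MIN_OPSET)
```

This way `"int4_awq"` resolves to the int4 minimum instead of silently falling back to `BASE_MIN_OPSET`.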
In `@modelopt/onnx/utils.py`:
- Around line 702-731: get_qdq_precisions currently only inspects
DequantizeLinear nodes with Constant inputs and misses QuantizeLinear nodes and
non-constant/Variable paths (activations), causing under-reporting of
precisions; update get_qdq_precisions to also iterate QuantizeLinear nodes and
extract precision from their output_dtype attribute where present, and for both
QuantizeLinear and DequantizeLinear handle Variable inputs by resolving the
tensor type via the graph/model value_info or node.output type (e.g., check
graph.value_info / model.graph.value_info / model.graph.input/output types for
the corresponding tensor and use its elem_type/name), while still keeping the
existing Constant-path logic (Constant.values.dtype.name) and preserving
detection of custom nodes like TRT_FP4DynamicQuantize.
🧹 Nitpick comments (1)
tests/unit/onnx/test_quantize_api.py (1)
29-34: Avoid hard‑coded min‑opset duplication in tests. To reduce drift, consider deriving these values from the production constants.
♻️ Suggested refactor
```diff
-from modelopt.onnx.utils import get_opset_version
+from modelopt.onnx.utils import BASE_MIN_OPSET, QDQ_PRECISION_MIN_OPSET, get_opset_version
@@
 MIN_OPSET = {
-    "int8": 19,
-    "fp8": 19,
-    "int4": 21,
+    "int8": BASE_MIN_OPSET,
+    "fp8": BASE_MIN_OPSET,
+    "int4": QDQ_PRECISION_MIN_OPSET["int4"],
 }
```
Codecov Report
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #809      +/-   ##
==========================================
+ Coverage   73.33%   74.02%   +0.69%
==========================================
  Files         192      192
  Lines       19613    19664      +51
==========================================
+ Hits        14383    14557     +174
+ Misses       5230     5107     -123
==========================================
```

☔ View full report in Codecov by Sentry.
Tests failing - depends on #819 to fix CI
- Allow user to select the target opset
- Minimum opset will be defined according to quantization mode
- Upgrade onnxruntime to 1.23.0 to support nvfp4
- Add tests in tests/unit/onnx/test_quantize_api.py

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gwena Cunha <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
## What does this PR do?

**Type of change:** new feature

**Overview:**

- Allow user to select the target opset
- Minimum opset will be defined according to quantization mode
- Add tests in tests/unit/onnx/test_quantize_api.py

## Testing

Added unit tests:

```
tests/unit/onnx/test_quantize_api.py::test_opset_below_minimum_upgrades_to_minimum[int8] PASSED [ 11%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_minimum_upgrades_to_minimum[fp8] PASSED [ 22%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_minimum_upgrades_to_minimum[int4] PASSED [ 33%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_original_uses_original[int8] PASSED [ 44%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_original_uses_original[fp8] PASSED [ 55%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_original_uses_original[int4] PASSED [ 66%]
tests/unit/onnx/test_quantize_api.py::test_opset_above_minimum[int8] PASSED [ 77%]
tests/unit/onnx/test_quantize_api.py::test_opset_above_minimum[fp8] PASSED [ 88%]
tests/unit/onnx/test_quantize_api.py::test_opset_above_minimum[int4] PASSED [100%]
```

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: Yes - auto update according to argparser help
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes

## Additional Information

Requested as a WAR for a Windows-onnxruntime issue in 5525939, but regardless, it's a useful feature to have.

## Summary by CodeRabbit

* **New Features**
  * Added `--opset` CLI option enabling users to specify target ONNX opset version when quantizing models.
  * Automatic validation ensures the opset version is compatible with quantization requirements, with warnings when adjustments are made.
* **Tests**
  * Added comprehensive test coverage for opset version handling across quantization workflows.

---------

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gwena Cunha <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>